Goto

Collaborating Authors

 tr 2



High-Dimensional Importance-Weighted Information Criteria: Theory and Optimality

Cao, Yong-Syun, Imori, Shinpei, Ing, Ching-Kang

arXiv.org Machine Learning

V arious methods for high-dimensional model selection have been developed in recent years to address situations where the training and test data come from different distributions. When both input and output variables are available in the source (training) and target (test) domains but the target sample size is small, estimates based solely on the target data often suffer from high variance. To improve accuracy, auxiliary estimates from the source domain can be incorporated, along with bias correction to account for domain differences. This transfer learning strategy facilitates more reliable estimation under limited target information (see, for example, Li et al. (2021), Bastani (2021), and Tian and Feng (2022)). However, when test outputs (i.e., target responses) are unavailable, estimation or bias correction involving both domains becomes infeasible, as only inputs (covariates) are observed in the test set.


Golden Ratio Weighting Prevents Model Collapse

He, Hengzhi, Xu, Shirong, Cheng, Guang

arXiv.org Machine Learning

Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon theoretically within a novel framework, where generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation and linear regression. We theoretically characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and generative model performance. Notably, in some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.


Purest Quantum State Identification

Yu, Yingqi, Chen, Honglin, Wu, Jun, Xie, Wei, Li, Xiangyang

arXiv.org Artificial Intelligence

Precise identification of quantum states under noise constraints is essential for quantum information processing. In this study, we generalize the classical best arm identification problem to quantum domains, designing methods for identifying the purest one within $K$ unknown $n$-qubit quantum states using $N$ samples. %, with direct applications in quantum computation and quantum communication. We propose two distinct algorithms: (1) an algorithm employing incoherent measurements, achieving error $\exp\left(- \Omega\left(\frac{N H_1}{\log(K) 2^n }\right) \right)$, and (2) an algorithm utilizing coherent measurements, achieving error $\exp\left(- \Omega\left(\frac{N H_2}{\log(K) }\right) \right)$, highlighting the power of quantum memory. Furthermore, we establish a lower bound by proving that all strategies with fixed two-outcome incoherent POVM must suffer error probability exceeding $ \exp\left( - O\left(\frac{NH_1}{2^n}\right)\right)$. This framework provides concrete design principles for overcoming sampling bottlenecks in quantum technologies.


Addressing Label Shift in Distributed Learning via Entropy Regularization

Wu, Zhiyuan, Choi, Changkyu, Cao, Xiangcheng, Cevher, Volkan, Ramezani-Kebrya, Ali

arXiv.org Artificial Intelligence

We address the challenge of minimizing true risk in multi-node distributed learning. These systems are frequently exposed to both inter-node and intra-node label shifts, which present a critical obstacle to effectively optimizing model performance while ensuring that data remains confined to each node. To tackle this, we propose the Versatile Robust Label Shift (VRLS) method, which enhances the maximum likelihood estimation of the test-to-train label density ratio. VRLS incorporates Shannon entropy-based regularization and adjusts the density ratio during training to better handle label shifts at the test time. In multi-node learning environments, VRLS further extends its capabilities by learning and adapting density ratios across nodes, effectively mitigating label shifts and improving overall model performance. Experiments conducted on MNIST, Fashion MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS, outperforming baselines by up to 20% in imbalanced settings. These results highlight the significant improvements VRLS offers in addressing label shifts. Our theoretical analysis further supports this by establishing high-probability bounds on estimation errors.


Tight bounds on Pauli channel learning without entanglement

Chen, Senrui, Oh, Changhun, Zhou, Sisi, Huang, Hsin-Yuan, Jiang, Liang

arXiv.org Artificial Intelligence

Entanglement is a useful resource for learning, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize separable states, measurements, and operations between the main system of interest and an ancillary system. These algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. We prove a tight lower bound for learning Pauli channels without entanglement that closes a cubic gap between the best-known upper and lower bound. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ rounds of measurements. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for characterizing Pauli noise.


Analytic theory for the dynamics of wide quantum neural networks

Liu, Junyu, Najafi, Khadijeh, Sharma, Kunal, Tacchino, Francesco, Jiang, Liang, Mezzacapo, Antonio

arXiv.org Artificial Intelligence

Parameterized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, much of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments.


Modelling and Explaining Legal Case-based Reasoners through Classifiers

Liu, Xinghan, Lorini, Emiliano, Rotolo, Antonino, Sartor, Giovanni

arXiv.org Artificial Intelligence

This paper brings together two lines of research: factor-based models of case-based reasoning (CBR) and the logical specification of classifiers. Logical approaches to classifiers capture the connection between features and outcomes in classifier systems. Factor-based reasoning is a popular approach to reasoning by precedent in AI & Law. Horty (2011) has developed the factor-based models of precedent into a theory of precedential constraint. In this paper we combine the modal logic approach (binary-input classifier, BLC) to classifiers and their explanations given by Liu & Lorini (2021) with Horty's account of factor-based CBR, since both a classifier and CBR map sets of features to decisions or classifications. We reformulate case bases of Horty in the language of BCL, and give several representation results. Furthermore, we show how notions of CBR, e.g. reason, preference between reasons, can be analyzed by notions of classifier system.


Shrinkage Estimation of Higher Order Bochner Integrals

Utpala, Saiteja, Sriperumbudur, Bharath K.

arXiv.org Machine Learning

We consider shrinkage estimation of higher order Hilbert space valued Bochner integrals in a non-parametric setting. We propose estimators that shrink the $U$-statistic estimator of the Bochner integral towards a pre-specified target element in the Hilbert space. Depending on the degeneracy of the kernel of the $U$-statistic, we construct consistent shrinkage estimators with fast rates of convergence, and develop oracle inequalities comparing the risks of the the $U$-statistic estimator and its shrinkage version. Surprisingly, we show that the shrinkage estimator designed by assuming complete degeneracy of the kernel of the $U$-statistic is a consistent estimator even when the kernel is not complete degenerate. This work subsumes and improves upon Krikamol et al., 2016, JMLR and Zhou et al., 2019, JMVA, which only handle mean element and covariance operator estimation in a reproducing kernel Hilbert space. We also specialize our results to normal mean estimation and show that for $d\ge 3$, the proposed estimator strictly improves upon the sample mean in terms of the mean squared error.


Observing Interventions: A logic for thinking about experiments

Barbero, Fausto, Schulz, Katrin, Velázquez-Quesada, Fernando R., Xie, Kaibo

arXiv.org Artificial Intelligence

This paper makes a first step towards a logic of learning from experiments. For this, we investigate formal frameworks for modeling the interaction of causal and (qualitative) epistemic reasoning. Crucial for our approach is the idea that the notion of an intervention can be used as a formal expression of a (real or hypothetical) experiment. In a first step we extend the well-known causal models with a simple Hintikka-style representation of the epistemic state of an agent. In the resulting setting, one can talk not only about the knowledge of an agent about the values of variables and how interventions affect them, but also about knowledge update. The resulting logic can model reasoning about thought experiments. However, it is unable to account for learning from experiments, which is clearly brought out by the fact that it validates the no learning principle for interventions. Therefore, in a second step, we implement a more complex notion of knowledge that allows an agent to observe (measure) certain variables when an experiment is carried out. This extended system does allow for learning from experiments. For all the proposed logical systems, we provide a sound and complete axiomatization.